Assas-band, an Affix-Exception-List Based Urdu Stemmer

Authors

  • Qurat ul Ain Akram
  • Asma Naseer
  • Sarmad Hussain
Abstract

Both inflectional and derivational morphology lead to multiple surface forms of a word. Stemming reduces these forms to their stem or root and is a useful tool for many applications. No prior work on Urdu stemming has been reported. The current work develops an Urdu stemmer, Assas-Band, and improves its performance by using more precise affix-based exception lists instead of the conventional lexical lookup employed in stemmers for other languages. Testing shows an accuracy of 91.2%. Further enhancements are also suggested.
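The idea of affix-based exception lists can be illustrated with a minimal sketch. It is not the Assas-Band implementation: the function names (`strip_affix`, `stem`), the prefix-then-suffix order, and the tiny affix and exception lists are all assumptions chosen for illustration. Each affix carries its own list of words that merely look as if they contain that affix and must be left intact.

```python
# Toy data for illustration only; these lists are NOT taken from the paper.
# Each affix maps to the set of words that must not be stripped of it.
SUFFIX_EXCEPTIONS = {
    "وں": {"کیوں"},  # "kyun" (why) only happens to end in these letters
}
PREFIX_EXCEPTIONS = {
    "بد": {"بدن"},  # "badan" (body) only happens to start with these letters
}


def strip_affix(word, exceptions, is_prefix):
    """Remove the longest matching affix unless the word is a listed exception."""
    for affix in sorted(exceptions, key=len, reverse=True):
        matches = word.startswith(affix) if is_prefix else word.endswith(affix)
        # Skip affixes that do not match or that would consume the whole word.
        if not matches or len(word) <= len(affix):
            continue
        if word in exceptions[affix]:
            return word  # exception: the letters are part of the stem
        return word[len(affix):] if is_prefix else word[:-len(affix)]
    return word


def stem(word):
    """Hypothetical pipeline: strip a prefix first, then a suffix."""
    word = strip_affix(word, PREFIX_EXCEPTIONS, is_prefix=True)
    return strip_affix(word, SUFFIX_EXCEPTIONS, is_prefix=False)


if __name__ == "__main__":
    for w in ["کتابوں", "کیوں", "بدصورت", "بدن"]:
        print(w, "->", stem(w))
```

Compared with a full lexical lookup, the per-affix lists in this sketch only need to enumerate the words that would be stemmed incorrectly for a given affix, which keeps the stored data small.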


Similar resources

Template based affix stemmer for a morphologically rich language

Word stemming is one of the most significant factors affecting the performance of Natural Language Processing (NLP) applications such as Information Retrieval (IR) systems, part-of-speech tagging, machine translation systems, and syntactic parsing. The Urdu language raises several challenges for NLP, largely due to its rich morphology. In the Urdu language, the stemming process is different as compared to th...


Challenges in Developing a Rule based Urdu Stemmer

The Urdu language raises several challenges for Natural Language Processing (NLP), largely due to its rich morphology. In this language, morphological processing becomes particularly important for Information Retrieval (IR). The core tool of IR is a stemmer, which reduces a word to its stem form. Due to the diverse nature of Urdu, developing a stemmer is a challenging task. In Urdu, there are large numb...


A Light Weight Stemmer for Urdu Language: A Scarce Resourced Language

Stemming is a procedure that conflates morphologically related terms into a single term without doing complete morphological analysis. The Urdu language raises several challenges for Natural Language Processing (NLP), largely due to its rich morphology. The core tool of information retrieval (IR) is a stemmer, which reduces a word to its stem form. Due to the diverse nature of Urdu, developing its ste...


Rule Based Urdu Stemmer

This paper presents a rule-based Urdu stemmer. In this technique, rules are applied to remove suffixes and prefixes from inflected words. Urdu is a widely spoken language all over the world, but little work has been done on Urdu stemming. A stemmer helps to find the root of an inflected word. Various possibilities of inflected words like وں (vao+noon-gunna), ے (badi-ye), یاں (choti-ye+alif+noon-gunna) ...


Stemming in Tamil for Affix Stripping

Stemming is one of the most important steps in many Natural Language Processing tasks. Stemming reduces inflected words to a common stem or root word. Stemming has mainly been carried out for the English language, because the Tamil language is more complex in structure and, moreover, has critical grammatical rules. Tamil is a Dravidian language, mainly spoken by Tamils. Tamil words have mo...




Publication date: 2009